Structure-inducing pre-training
Authors
Abstract
Language model pre-training and the derived general-purpose methods have reshaped machine learning research. However, there remains considerable uncertainty regarding why pre-training improves the performance of downstream tasks. This challenge is pronounced when using language model pre-training in domains outside of natural language. Here we investigate this problem by analysing how pre-training methods impose relational structure in their induced per-sample latent spaces; that is, what constraints do pre-training methods impose on the distance or geometry between the pre-trained embeddings of samples. A comprehensive review reveals that this question remains open, despite theoretical analyses showing the importance of understanding this form of induced structure. Based on this review, we introduce a pre-training framework that enables a granular and comprehensive understanding of how relational structure can be induced. We present a theoretical analysis of this framework from first principles and establish a connection between this inductive bias and fine-tuning performance. Empirical studies spanning three data modalities and ten fine-tuning tasks confirm our analyses, inform the design of novel pre-training methods and establish consistent improvements over a compelling suite of baseline methods.
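The central idea, constraining the distances between pre-trained embeddings of related samples, can be illustrated with a small contrastive-style penalty. This is a minimal sketch under stated assumptions, not the paper's actual objective: the function name, the link set, and the margin value are all illustrative.

```python
import numpy as np

def structure_inducing_penalty(embeddings, links, margin=1.0):
    """Hypothetical penalty that pulls linked sample pairs together in
    the latent space and pushes unlinked pairs apart up to `margin`.
    `embeddings` is (n, d); `links` is a set of (i, j) index pairs."""
    n = embeddings.shape[0]
    loss = 0.0
    for i in range(n):
        for j in range(i + 1, n):
            d = np.linalg.norm(embeddings[i] - embeddings[j])
            if (i, j) in links or (j, i) in links:
                loss += d ** 2                     # pull linked pairs together
            else:
                loss += max(0.0, margin - d) ** 2  # push other pairs apart
    return loss

# Toy check: embeddings that cluster according to the links score lower.
links = {(0, 1), (2, 3)}
good = np.array([[0.0, 0.0], [0.1, 0.0], [5.0, 5.0], [5.1, 5.0]])
bad  = np.array([[0.0, 0.0], [5.0, 5.0], [0.1, 0.0], [5.1, 5.0]])
assert structure_inducing_penalty(good, links) < structure_inducing_penalty(bad, links)
```

Minimizing such a term alongside a standard pre-training loss is one way a relational structure over samples can be imposed on the induced latent space.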
Similar resources
Pre-training Attention Mechanisms
Recurrent neural networks with differentiable attention mechanisms have had success in generative and classification tasks. We show that the classification performance of such models can be enhanced by guiding a randomly initialized model to attend to salient regions of the input in early training iterations. We further show that, if explicit heuristics for guidance are unavailable, a model tha...
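The guidance idea in this snippet can be sketched as an auxiliary loss that nudges the model's attention map toward a heuristic salience map during early iterations and then fades out. This is a hypothetical formulation, not the cited paper's method; the function name, the cross-entropy choice, and the linear warmup schedule are assumptions.

```python
import numpy as np

def attention_guidance_loss(attn, salience, step, warmup=1000):
    """Auxiliary loss steering the attention distribution `attn` toward a
    heuristic `salience` distribution; annealed to zero after `warmup`."""
    weight = max(0.0, 1.0 - step / warmup)          # fade guidance out
    ce = -np.sum(salience * np.log(attn + 1e-12))   # cross-entropy H(salience, attn)
    return weight * ce

salience = np.array([0.7, 0.2, 0.1])
aligned  = np.array([0.7, 0.2, 0.1])
uniform  = np.array([1 / 3, 1 / 3, 1 / 3])
assert attention_guidance_loss(aligned, salience, step=0) < \
       attention_guidance_loss(uniform, salience, step=0)
assert attention_guidance_loss(uniform, salience, step=2000) == 0.0
```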
Knowledge Transfer Pre-training
Pre-training is crucial for learning deep neural networks. Most existing pre-training methods train simple models (e.g., restricted Boltzmann machines) and then stack them layer by layer to form the deep structure. This layerwise pre-training has found strong theoretical foundation and broad empirical support. However, it is not easy to employ such a method to pre-train models without a clear ...
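The layer-by-layer stacking described above can be sketched with shallow autoencoders: each layer is trained to reconstruct the previous layer's codes, and the collected encoder weights initialize a deep network. This is a simplified stand-in (tanh autoencoders trained by plain gradient descent), not the cited paper's procedure; all names and hyperparameters are assumptions.

```python
import numpy as np

def train_autoencoder_layer(X, hidden, steps=500, lr=0.05, seed=0):
    """Train one tanh autoencoder layer on X; return (weights, codes)."""
    rng = np.random.default_rng(seed)
    W = 0.1 * rng.normal(size=(X.shape[1], hidden))
    V = 0.1 * rng.normal(size=(hidden, X.shape[1]))
    for _ in range(steps):
        H = np.tanh(X @ W)              # encode
        R = H @ V                       # decode
        G = 2.0 * (R - X) / len(X)      # gradient of reconstruction error
        dH = (G @ V.T) * (1 - H ** 2)   # backprop through tanh
        V -= lr * (H.T @ G)
        W -= lr * (X.T @ dH)
    return W, np.tanh(X @ W)

# Greedy stacking: each layer is trained on the previous layer's codes.
rng = np.random.default_rng(1)
X = rng.normal(size=(100, 8))
codes, stack = X, []
for hidden in (6, 4):
    W, codes = train_autoencoder_layer(codes, hidden)
    stack.append(W)

assert [w.shape for w in stack] == [(8, 6), (6, 4)]
```

The weights in `stack` would then initialize an 8-6-4 deep network before supervised fine-tuning.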
Inducing Structure for Vision and Language
The ability of children to solve complex learning problems during their first years of life has fascinated philosophers and researchers throughout time. While we are still far from completely understanding this process, there has been some interesting recent work in learning and language grounding (Regier, 2003; Roy, 2005; Yu et al., 2005). The computational models presented therein are capable...
Pre-Training CNNs Using Convolutional Autoencoders
Despite convolutional neural networks being the state of the art in almost all computer vision tasks, their training remains a difficult task. Unsupervised representation learning using a convolutional autoencoder can be used to initialize network weights and has been shown to improve test accuracy after training. We reproduce previous results using this approach and successfully apply it to th...
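The initialize-from-autoencoder pattern above can be shown with a deliberately simplified linear (non-convolutional) stand-in: pre-train an autoencoder on unlabeled data, then reuse the encoder weights to initialize a downstream network. All data, dimensions, and hyperparameters here are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy data lying near a 1-D subspace, so a 1-unit code suffices.
direction = np.array([[0.8, -0.5, 0.3]])
X = rng.normal(size=(200, 1)) @ direction
X += 0.01 * rng.normal(size=X.shape)

# Linear autoencoder: encoder W (3 -> 1), decoder V (1 -> 3),
# trained by gradient descent on the reconstruction error.
W = 0.1 * rng.normal(size=(3, 1))
V = 0.1 * rng.normal(size=(1, 3))
for _ in range(2000):
    H = X @ W                   # encode
    R = H @ V                   # decode
    G = 2.0 * (R - X) / len(X)  # gradient of reconstruction error w.r.t. R
    dW = X.T @ (G @ V.T)
    dV = H.T @ G
    W -= 0.05 * dW
    V -= 0.05 * dV

# After pre-training, W would initialize the first layer of a classifier.
reconstruction_error = np.mean((X @ W @ V - X) ** 2)
assert reconstruction_error < 0.1
```

In the convolutional setting the encoder's learned filters, rather than a dense matrix, are copied into the CNN's early layers before supervised training.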
Pre-training of Hidden-Unit CRFs
In this paper, we apply the concept of pretraining to hidden-unit conditional random fields (HUCRFs) to enable learning on unlabeled data. We present a simple yet effective pre-training technique that learns to associate words with their clusters, which are obtained in an unsupervised manner. The learned parameters are then used to initialize the supervised learning process. We also propose a w...
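The word-to-cluster association step can be sketched as softmax regression: per-word parameters are trained to predict each word's unsupervised cluster id, and the learned vectors then initialize the supervised model. This is a hedged illustration, not the paper's HUCRF pre-training; the tiny vocabulary, cluster assignment, and learning rate are assumptions.

```python
import numpy as np

rng = np.random.default_rng(1)
vocab, n_clusters, dim = 6, 2, 4
clusters = np.array([0, 0, 0, 1, 1, 1])        # ids from unsupervised clustering

E = 0.01 * rng.normal(size=(vocab, dim))       # per-word parameters to pre-train
C = 0.01 * rng.normal(size=(dim, n_clusters))  # softmax output weights
for _ in range(300):
    logits = E @ C
    p = np.exp(logits - logits.max(axis=1, keepdims=True))
    p /= p.sum(axis=1, keepdims=True)           # softmax over clusters
    grad = p.copy()
    grad[np.arange(vocab), clusters] -= 1.0     # d(cross-entropy)/d(logits)
    dE, dC = grad @ C.T, E.T @ grad
    E -= 0.5 * dE
    C -= 0.5 * dC

# The pre-trained rows of E now predict each word's cluster and can
# initialize the supervised learner's word parameters.
assert np.all(np.argmax(E @ C, axis=1) == clusters)
```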
Journal
Journal title: Nature Machine Intelligence
Year: 2023
ISSN: 2522-5839
DOI: https://doi.org/10.1038/s42256-023-00647-z